A methodology is developed for the adjustment of the covariance matrices underlying a multivariate
constant time series dynamic linear model. The covariance matrices are embedded in a distribution-free
inner-product space of matrix objects which facilitates such adjustment. This approach helps to make
the analysis simple, tractable and robust. To illustrate the methods, a simple model is developed for a
time series representing sales of certain brands of a product from a cash-and-carry depot. The covariance
structure underlying the model is revised, and the benefits of this revision on first order inferences are
then examined.

1 Introduction



Bayesian covariance matrix estimation is a notoriously difficult problem. For large matrices in particular,
specification of beliefs about the distribution of the matrix is particularly hard. Distributional Bayesian
approaches to the problem have tended to make use of restrictive conjugate prior assumptions in order to fix
the prior distribution with as few hyper-parameters as possible. More recently, Leonard and Hsu (1992) have
weakened the distributional assumptions required. However, specification within their formulation is very
difficult, and, for large matrices, the computational problems are considerable. Specification is somewhat
easier using the approach of Brown, Le, and Zidek (1994), but the other problems remain. In Wilkinson
and Goldstein (1995), we outline a new approach to covariance matrix adjustment, exploiting second order
exchangeability specifications, and a geometric space where random matrices live naturally.

Covariance matrix adjustment for dynamic linear models is reviewed in West and Harrison (1989). For
multivariate time series, the observational covariance matrix can be updated for a class of models known
as matrix normal models using a simple conjugate prior approach. However, the distributional assumptions
required are extremely restrictive, and it is difficult to learn about the covariance matrix for the updating
of the state vector.

In this paper, we apply our approach to develop a methodology for the revision of the underlying
covariance structures for a dynamic linear model, free from any distributional restrictions, using Bayes linear
estimators for the covariance matrices based on simple quadratic observables. We do this by constructing an 
inner-product space of random matrices containing both the underlying covariance matrices and observables 
predictive for them. Bayes linear estimates for the underlying matrices follow by orthogonal projection. 

We illustrate the method with data derived from the weekly sales of six leading brands of shampoo from 
a medium sized cash-and-carry depot. The sales are modelled taking into account the underlying demand 
and competition effects, and the covariance structure over the resulting dynamic linear model is adjusted 
using the weekly sales data. 

2 The dynamic linear model 

2.1 The general model 

Let $X_1, X_2, \ldots$ be an infinite sequence of random vectors, each of length $r$, such that $X_t = (X_{1t}, X_{2t}, \ldots, X_{rt})^T$.
These vectors represent the observations at each time point. Suppose that we model the relationships between 
these vectors in the following way. 

$$X_t = F^T \Theta_t + \nu_t \tag{1}$$
$$\Theta_t = G \Theta_{t-1} + \omega_t \tag{2}$$

Our prior second-order specification is as follows: 

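$$E(\nu_t) = E(\omega_t) = 0, \quad \mathrm{Var}(\Theta_0) = \Sigma, \quad \mathrm{Var}(\nu_t) = V, \quad \mathrm{Var}(\omega_t) = W, \quad \forall t \tag{3}$$

$$\mathrm{Cov}(\Theta_s, \nu_t) = \mathrm{Cov}(\Theta_s, \omega_t) = \mathrm{Cov}(\nu_s, \omega_t) = 0 \ \ \forall s, t, \qquad \mathrm{Cov}(\omega_s, \omega_t) = \mathrm{Cov}(\nu_s, \nu_t) = 0 \ \ \forall s \neq t \tag{4}$$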

where we adopt the usual conventions $\mathrm{Cov}(A, B) = E(AB^T) - E(A)E(B^T)$ and $\mathrm{Var}(A) = \mathrm{Cov}(A, A)$. The
state vector, $\Theta_t$, is $p$-dimensional, and the $p \times r$ and $p \times p$ dimensional matrices $F$ and $G$ are assumed
to be known. This is a second-order description of the (constant) multivariate time series dynamic linear
model (DLM) described in West and Harrison (1989). We make no distributional assumptions for any of
the components in the model. In this paper, we describe ways to learn about $V$ and $W$ from data. West
and Harrison (1989) (Chapter 15) give a conjugate prior solution to the problem of learning about $V$ for a
class of these models known as matrix normal models, if one is prepared to make the necessary distributional
assumptions. However, methods for learning about the matrix $W$ tend primarily to be ad hoc.



2.2 Example 

To illustrate our approach, we consider a simple locally constant model for the sales of 6 leading brands
of shampoo from a medium sized cash-and-carry depot. As above, $X_1, X_2, \ldots$ is our sequence of random
vectors, each of length 6, such that $X_t = (X_{1t}, X_{2t}, \ldots, X_{6t})^T$. The component $X_{it}$ represents the (unknown)
sales of brand $i$ at time $t$. The vectors of sales are modelled as follows

$$X_t = \Theta_t + \nu_t \quad \forall t \tag{5}$$

where

$$\Theta_t = \Theta_{t-1} + \omega_t \quad \forall t \tag{6}$$

Prior beliefs are given by (3) and (4). Here we are assuming the process to be locally constant, but
with different underlying demands for each of the components of the series. This is a simple model, with no
seasonal component, chosen to illustrate our methodology, and would be unrealistic if there were noticeable
trends within any of the components of the series. However, for high dimensional time series with no obvious
trends, it is often the case that, provided the covariance structure is appropriate, many of the interesting
features of the series can be captured using just such a model. To this end we introduce covariances between
components of the state vector and also for the way demand changes over time, and for the way observations
vary from the underlying demand. A more detailed treatment of multivariate sales forecasting within a fully
specified Bayesian framework is given by Queen, Smith, and James (1994) and Queen (1994), who consider
the problem of developing a dynamic model for multivariate sales, and the development of a prior distribution
with sufficient flexibility to capture the effects of market interaction.

The second-order DLM requires the following quantifications. Firstly, the $F$ and $G$ matrices must be
specified. Then, a priori specifications are needed for the expectation of the initial state vector, $\mu = E(\Theta_0)$.
Finally, we must specify the matrices $\Sigma = \mathrm{Var}(\Theta_0)$, $V = \mathrm{Var}(\nu_t)$, $W = \mathrm{Var}(\omega_t)$, $\forall t$.

In our example the specification for the mean vector was 

$$E(\Theta_0) = (10, 9, 9, 8, 8, 7)^T \tag{7}$$

The following specifications were made for the covariance matrices, using exchangeability judgements
concerning the way the observations vary from their means. Using the notation $A_{ij}$ for the $(i,j)^{th}$ element of the
matrix $A$, we have

$$\Sigma_{ii} = 9 \ \ \forall i, \qquad \Sigma_{ij} = 3 \ \ \forall i \neq j \tag{8}$$
$$W_{ii} = 4 \ \ \forall i, \qquad W_{ij} = 1 \ \ \forall i \neq j \tag{9}$$
$$V_{ii} = 36 \ \ \forall i, \qquad V_{ij} = -4 \ \ \forall i \neq j \tag{10}$$

In truth, there is perhaps more symmetry in these specifications than is really appropriate, but specification 
is hard, and viewing variation in the sales of the various shampoos as second-order exchangeable greatly 
reduces the number of specifications which we have to make over the second order structure, and will allow 
further exchangeability modelling to simplify the fourth order specifications in later sections. 

Notice however that many aspects of the underlying mechanisms have been captured by these
specifications. In this model, $\Theta_t$ represents the vector of demands at time $t$. From the positive correlations in
$\mathrm{Var}(\Theta_0)$, if the mean of one product turned out to be higher than anticipated, we would revise upwards
beliefs about the means of the other products. Also, the positive correlations within $\mathrm{Var}(\omega_t)$ indicate that
there is a common component to the demands, whilst the negative correlations within $\mathrm{Var}(\nu_t)$ indicate that
brands are competing, and tend to succeed at the expense of each other.
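
As a quick numerical companion to the specifications (8), (9) and (10), the sketch below builds the three exchangeable matrices for six brands and reports the common correlation and the two distinct eigenvalues of each. The helper names `exchangeable` and `summary` are illustrative only and are not part of the original analysis. The eigenvalue formula used is the standard one for a matrix with a constant diagonal and a constant off-diagonal.

```python
# Prior second-order specifications (8)-(10) for r = 6 brands.
R = 6

def exchangeable(diag, off, r=R):
    """r x r matrix with `diag` on the diagonal and `off` elsewhere."""
    return [[diag if i == j else off for j in range(r)] for i in range(r)]

def summary(diag, off, r=R):
    """Common correlation and the two distinct eigenvalues of the matrix.

    A matrix with constant diagonal and constant off-diagonal has eigenvalue
    diag - off (multiplicity r - 1) and diag + (r - 1) * off (multiplicity 1).
    """
    return off / diag, diag - off, diag + (r - 1) * off

SIGMA = exchangeable(9, 3)    # Var(Theta_0), from (8)
W = exchangeable(4, 1)        # Var(omega_t), from (9)
V = exchangeable(36, -4)      # Var(nu_t), from (10)

for name, (d, o) in {"Sigma": (9, 3), "W": (4, 1), "V": (36, -4)}.items():
    corr, lam_a, lam_b = summary(d, o)
    print(f"{name}: correlation {corr:+.3f}, eigenvalues {lam_a} and {lam_b}")
```

All eigenvalues come out positive, so each specification is a valid covariance matrix, and the signs of the correlations match the demand and competition interpretation given above.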

2.3 Bayes linear analysis 

We take a Bayes linear approach to subjective statistical inference, making expectation (rather than
probability) primitive. An overview of the methodology is given in Farrow and Goldstein (1993), in which the
emphasis is on learning about means. With the second-order specification that we have made, we may use
sales data to carry out an analogous Bayes linear analysis which will be informative for the mean of future
observations. However, we will not learn about the covariance matrices $W = \mathrm{Var}(\omega_t)$ or $V = \mathrm{Var}(\nu_t)$. In
this paper, we describe how such learning may take place.

3 Quadratic products 

3.1 Exchangeable decomposition of unobservable products 

For the matrix $A = (a_1, a_2, \ldots, a_n)$, we define

$$\mathrm{vec}\,A = (a_1^T, a_2^T, \ldots, a_n^T)^T \tag{11}$$



For the general DLM outlined in Section 2.1, we form the quadratic products of $\omega_t$ and $\nu_t$, namely
$\mathrm{vec}(\omega_t\omega_t^T)$ and $\mathrm{vec}(\nu_t\nu_t^T)$. We view $\mathrm{vec}(\omega_t\omega_t^T)$ and $\mathrm{vec}(\nu_t\nu_t^T)$ to be second-order exchangeable over
$t$. By this, we mean that our second-order beliefs over the vectors of quadratic products of residuals will
remain invariant under the action of an arbitrary permutation of the $t$ index (this is what we mean when
describing a DLM as constant). From the second-order exchangeability representation theorem (Goldstein
1986), we may represent an element of a second-order exchangeable collection of vectors as the sum of a
mean vector, common to all elements, and a residual vector, uncorrelated with the mean vector and all other
residual vectors. We may apply this representation to $\mathrm{vec}(\omega_t\omega_t^T)$, and then re-write the representation in
matrix form as

$$\omega_t\omega_t^T = V^\omega + S_t^\omega \quad \forall t \geq 1 \tag{12}$$

where $V^\omega$ and $S_t^\omega$ are random matrices of the same dimension as $\omega_t\omega_t^T$, $E(\mathrm{vec}(S_t^\omega)) = 0$ and
$\mathrm{Cov}(\mathrm{vec}(V^\omega), \mathrm{vec}(S_t^\omega)) = \mathrm{Cov}(\mathrm{vec}(S_s^\omega), \mathrm{vec}(S_t^\omega)) = 0$, $\forall s \neq t$, $\mathrm{Var}(\mathrm{vec}\,S_s^\omega) = \mathrm{Var}(\mathrm{vec}\,S_t^\omega)$, $\forall s, t$.
Decomposing $\mathrm{vec}(\nu_t\nu_t^T)$ similarly, we obtain

$$\nu_t\nu_t^T = V^\nu + S_t^\nu \quad \forall t \geq 1 \tag{13}$$

with properties as for representation (12). Note that $E(V^\omega) = E(\omega_t\omega_t^T) = \mathrm{Var}(\omega_t) = W$, and so learning
about $V^\omega$ will allow us to learn about the covariance matrix for the residuals for the state, and $E(V^\nu) =
E(\nu_t\nu_t^T) = \mathrm{Var}(\nu_t) = V$, and so learning about $V^\nu$ will allow us to learn about the covariance matrix for the
observational residuals. Representations (12) and (13) decompose our uncertainty about $\omega_t\omega_t^T$ and $\nu_t\nu_t^T$
into two parts. Bayes linear updating (with enough data) will eliminate the aspects of uncertainty derived
from uncertainty about $V^\omega$ and $V^\nu$.

In order to conduct a Bayes linear analysis on the quadratic structure we need additional covariance
specifications $\mathrm{Var}(\mathrm{vec}\,V^\omega)$, $\mathrm{Var}(\mathrm{vec}\,V^\nu)$, $\mathrm{Var}(\mathrm{vec}\,S_t^\omega)$ and $\mathrm{Var}(\mathrm{vec}\,S_t^\nu)$, for some $t$.

3.2 Example 

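In our example the $X_t$ vector is 6-dimensional, and so the matrices $V^\omega$, $V^\nu$, $S_t^\omega$ and $S_t^\nu$ are $6 \times 6$-dimensional.
Consequently, the matrices $\mathrm{Var}(\mathrm{vec}\,V^\omega)$, $\mathrm{Var}(\mathrm{vec}\,V^\nu)$, $\mathrm{Var}(\mathrm{vec}\,S_t^\omega)$ and $\mathrm{Var}(\mathrm{vec}\,S_t^\nu)$ are $36 \times 36$-dimensional.
When referring to the components of $\mathrm{Var}(\mathrm{vec}\,V^\omega)$, the notation $v^\omega_{ijkl}$ will be used to denote the covariance
between the $(i,j)^{th}$ and $(k,l)^{th}$ elements of $V^\omega$. Similar notation is used for $\mathrm{Var}(\mathrm{vec}\,V^\nu)$. Also $s^\omega_{ijkl}$ and
$s^\nu_{ijkl}$ are used for the components of $\mathrm{Var}(\mathrm{vec}\,S_t^\omega)$ and $\mathrm{Var}(\mathrm{vec}\,S_t^\nu)$ respectively. The following covariance
specifications were made for our example: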

v? m = 9/4, Vi, = 9/16, Vz ? j, = 1/5, Vi j, (14) 

w«it = 25, Vi, v v m = 1, Vi ^ j, vf^ = 4, Vi ^ (15) 

«S« = 30, V*, = 15, Vz # j, <^ = 2500, Vi, = 1000, Vz ^ j. (16) 

For instance, $v^\omega_{iiii}$ is the variance specification for the $(i,i)^{th}$ element of $V^\omega$, which represents the underlying
variance of the $i^{th}$ element of $\omega_t$. From (9), it has expectation 4. This value governs the rate of
change of $\Theta_t$. By considering the range of plausible variances for the way $\Theta_t$ might change over time, it was
felt reasonable that a standard deviation specification of 3/2 should be made. The other specifications were
made in a similar fashion. For simplicity in this example, the specifications for $s^\omega_{ijkl}$ and $s^\nu_{ijkl}$ were made
using the fourth moments of the multivariate normal distribution compatible with the given second-order
structure as a guide.
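
To make the normal-theory guide concrete, the following sketch computes the variances of squared and cross products for a zero-mean normal vector with the second-order structure of (9) and (10). The function name `normal_product_variances` is hypothetical, and the formulae are the standard fourth-moment identities for jointly normal quantities (Isserlis' theorem), not something taken from the original analysis. It is assumed here that the off-diagonal values in (16) refer to the variance of a product of two distinct components.

```python
def normal_product_variances(var, cov):
    """Variances of quadratic products for a zero-mean normal vector whose
    components share variance `var` and pairwise covariance `cov`.

    For zero-mean jointly normal z_i, z_j:
        Var(z_i ** 2)  = 2 * var ** 2
        Var(z_i * z_j) = var * var + cov ** 2   (i != j)
    """
    return 2 * var ** 2, var * var + cov ** 2

# State residuals omega_t: variance 4, covariance 1, from (9).
omega_sq, omega_cross = normal_product_variances(4, 1)
# Observational residuals nu_t: variance 36, covariance -4, from (10).
nu_sq, nu_cross = normal_product_variances(36, -4)

print(omega_sq, omega_cross)  # normal-theory guide for the omega products
print(nu_sq, nu_cross)        # normal-theory guide for the nu products
```

Under that assumption the computed values (32, 17, 2592 and 1312) sit close to the rounded specifications 30, 15, 2500 and 1000 in (16), which is consistent with the normal moments having been used as a guide rather than adopted exactly.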

While we would require considerably more specifications for a full Bayes or Bayes linear analysis, (14),
(15), and (16) are sufficient for our purposes as these are the only specifications needed for the matrix object
approach to belief revision which we are to take in the later sections.

There is a lot of symmetry in these values, greatly simplifying the specification, but once again, any 
non-negative covariance structure over the quadratic products is acceptable. Here we have used assumptions 
of exchangeability over the variances and the covariances. Many of the specifications made for the quadratic 
structure will be "averaged over" in the matrix object approach to covariance adjustment which we shall 
develop, and so there is a limit to the effort that we may wish to put into very detailed specifications at this 
stage since our suggested analysis will not be overly sensitive to the individual specifications. 

3.3 Observable quadratic terms 

We will first construct certain linear combinations of the observables which do not involve the state vector,
$\Theta_t$. This is useful for various reasons, and in particular because it greatly reduces the prior specification
required for the analysis of the quadratic structure. In this paper, we shall mainly be concerned with DLMs
for which there exists an $r \times r$ matrix $H$, such that $HF^T = F^TG$. We call such DLMs two-step invertible.
Note that a DLM will be two-step invertible if $F$ is of full rank and $r > p$ (as will often be the case
for high-dimensional time series), and there will often be many matrices $H$ satisfying $HF^T = F^TG$. For
example, $H = F^TGF^{T-}$ (where $F^{T-}$ represents any generalised inverse of $F^T$) is a solution. Further, if
$F$ is of full rank, $r < p$ and such a matrix exists, then $H = F^TGF(F^TF)^{-1}$, and so $H$ exists, if and only
if $F^TGF(F^TF)^{-1}F^T = F^TG$. Note also that the matrix $H^2$ has the property that $H^2F^T = F^TG^2$. For
a two-step invertible DLM, we may construct the following vectors of observables which do not involve the
state vector:

$$X'_t = X_t - HX_{t-1} = F^T\omega_t + \nu_t - H\nu_{t-1} \quad \forall t \geq 2 \tag{17}$$
$$X''_t = X_t - H^2X_{t-2} = F^T\omega_t + F^TG\omega_{t-1} + \nu_t - H^2\nu_{t-2} \quad \forall t \geq 3 \tag{18}$$

We form the matrices of quadratic products, $X'_t{X'_t}^T$ and $X''_t{X''_t}^T$, $\forall t$. As we shall see, these are predictive
for $V^\omega$ and $V^\nu$.



Not all DLMs are two-step invertible, but for the constant dynamic linear model outlined in Section 2.1
it is always possible to construct linear combinations of the observations which do not involve the state,
provided only that the constant dynamic linear model in question is observable (for a discussion of the very
weak restriction of observability, see West and Harrison (1989), Chapter 5). However, in general such linear
combinations require more than two successive observations from the series. Consequently, for simplicity
we restrict attention to the two-step invertible model. However, the approach is quite general, and may be
applied similarly for any constant observable DLM, the only difference being that the covariance specification
is more complicated, and more quantities are involved in the adjustment.

3.4 Example 

For our example, $F$, $G$ and $H$ are all the identity, and so we form the one and two-step differences of the
observables:

$$X_t^{(1)} = X_t - X_{t-1} = \omega_t + \nu_t - \nu_{t-1} \quad \forall t \geq 2 \tag{19}$$
$$X_t^{(2)} = X_t - X_{t-2} = \omega_t + \omega_{t-1} + \nu_t - \nu_{t-2} \quad \forall t \geq 3 \tag{20}$$

We then form the quadratic products of these, $X_t^{(1)}X_t^{(1)T}$ and $X_t^{(2)}X_t^{(2)T}$. These observables are predictive
for $V^\omega$ and $V^\nu$ and so may be used to learn about the underlying covariance structure. All means and
covariances that we require for the subsequent analysis are determined by specifications in (7), (8), (9), (10),
(14), (15) and (16). The precise form of the covariance structure over the observables is rather complex, and
is given in the appendix.
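
The claim that the one and two-step differences do not involve the state can be checked mechanically. The sketch below does this for a single component of the locally constant model (5)-(6). The residual values and the initial level are arbitrary illustrative numbers, not data from the sales series, and the names `one_step` and `two_step` are introduced only for this illustration.

```python
# Deterministic check of (19) and (20) for one component of the model (5)-(6).
# The residual values below are arbitrary illustrative numbers, not data.
omega = [0.0, 0.5, -1.0, 2.0, 0.25, -0.75]   # omega_t for t = 0..5 (omega_0 unused)
nu = [1.0, -2.0, 0.5, 3.0, -1.5, 0.0]        # nu_t for t = 0..5
theta0 = 10.0                                # arbitrary initial level

# Build the state and the observations:
# Theta_t = Theta_{t-1} + omega_t and X_t = Theta_t + nu_t.
theta = [theta0]
for t in range(1, len(omega)):
    theta.append(theta[t - 1] + omega[t])
x = [theta[t] + nu[t] for t in range(len(theta))]

def one_step(t):
    """X_t - X_{t-1}, which should equal omega_t + nu_t - nu_{t-1}."""
    return x[t] - x[t - 1]

def two_step(t):
    """X_t - X_{t-2}, which should equal omega_t + omega_{t-1} + nu_t - nu_{t-2}."""
    return x[t] - x[t - 2]

for t in range(2, len(x)):
    assert abs(one_step(t) - (omega[t] + nu[t] - nu[t - 1])) < 1e-12
    assert abs(two_step(t) - (omega[t] + omega[t - 1] + nu[t] - nu[t - 2])) < 1e-12
print("differences are free of the state for every t checked")
```

Changing `theta0` leaves both differences unchanged, which is the property that removes the need for prior specifications about the state when analysing the quadratic structure.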



4 n-step exchangeability 

4.1 n-step exchangeable collections 

The covariance structure over $X'_t$, $X''_t$ and their quadratic products has a general structure common amongst
ordered vectors of random quantities of the type arising from differenced time series. The covariance structure
is invariant under arbitrary translations and reflections of the ordering, and the auto-correlation function
becomes constant after some distance, $n$. We call ordered vectors with this property, second-order n-step
exchangeable. Covariance may be interpreted as an inner-product on a space of random quantities. We
will find it useful to consider inner-products on spaces of more general random entities. In particular, in
Section 5.1 we shall define an inner-product on a space of random matrices. We require a concept of n-step
exchangeability which is sufficiently general that it is also valid for this space of matrices, and so we formalise
the concept as follows.

Let $\{Y_{jk} \mid \forall j, k\}$ be a collection of random entities of interest to us. Also form a maximal linearly
independent collection of constant entities of the same type, and call this collection $C = \{C_1, C_2, \ldots\}$. When we
are dealing with random scalars, $C$ will consist of the single scalar, $C_1 = 1$. In Section 5.1, we describe the
constant space for a collection of random matrices. Form the vector space

$$\mathcal{V} = \mathrm{span}\{C_j, Y_{jk} \mid \forall j, k\} \tag{21}$$

so that the random entities are now vectors within this space. Define an inner-product $(\cdot,\cdot): \mathcal{V} \times \mathcal{V} \to \mathbb{R}$
on $\mathcal{V}$. The inner-product should capture certain aspects of our beliefs about the relationships between the
elements of $\mathcal{V}$; for scalars, we might use $(X, Y) = E(XY)$. Form the completion of the space $\mathcal{V}$, and denote
this Hilbert space by $\mathcal{H}$. Also define a bounded linear expectation function on the elements of $\mathcal{V}$. For
the purposes of this paper it suffices to think in terms of the usual definition of expectation. In general,
however, we define a bounded linear function $E(\cdot): \mathcal{H} \to \mathrm{span}\{C\}$, such that $\forall Y \in \mathcal{H}$, $E(Y)$ is the
orthogonal projection of $Y$ into $\mathrm{span}\{C\}$, with respect to the inner-product $(\cdot,\cdot)$. This is the generalisation
of expectation which we require for general spaces, and is defined in terms of the inner-product. It coincides
with the usual definitions for all of the examples in this paper.

If $\exists n \in \mathbb{N}$ such that $E(Y_{jk}) = e_j$, $\forall j, k$, and

$$\begin{aligned}
(Y_{ik}, Y_{jl}) &= d_{0ij} && \forall i, j, \ |k - l| = 0 \\
(Y_{ik}, Y_{jl}) &= d_{1ij} && \forall i, j, \ |k - l| = 1 \\
&\ \ \vdots \\
(Y_{ik}, Y_{jl}) &= d_{(n-1)ij} && \forall i, j, \ |k - l| = n - 1 \\
(Y_{ik}, Y_{jl}) &= c_{ij} && \forall i, j, \ |k - l| \geq n
\end{aligned} \tag{22}$$

the collection $\{Y_{jk} \mid \forall j, k\}$ is said to be generalised second-order n-step exchangeable over $k$.

We apply this definition to the elements, $X^* = \{X'_{it}X'_{jt} \mid \forall i, j, \forall t \geq 2\}$ and $X^{**} = \{X''_{it}X''_{jt} \mid \forall i, j, \forall t \geq 3\}$,
of the matrices $\{X'_t{X'_t}^T \mid \forall t \geq 2\}$ and $\{X''_t{X''_t}^T \mid \forall t \geq 3\}$ defined in Section 3.3, where $X'_{it}$ denotes the $i^{th}$
component of the vector $X'_t$. Form a vector space, $\mathcal{V}$, consisting of all linear combinations of the elements of
$X^*$ and $X^{**}$ and the unit constant, and define the inner-product on this space as $(X, Y) = E(XY)$, $\forall X, Y \in
\mathcal{V}$. We may easily check that $X^*$ is (second order) 2-step exchangeable over $t$, and that $X^{**}$ is 3-step
exchangeable over $t$.

4.2 Representation for n-step exchangeable collections

Goldstein (1986) constructs a general representation for second-order exchangeable collections. There is an
analogous representation for collections with the weaker property of n-step exchangeability, constructed in
a similar way.

Theorem 1 Let $\{Y_{jk} \mid \forall j, k\}$ be generalised second-order n-step exchangeable over $k$ with respect to the
inner-product $(\cdot,\cdot)$. Then the $Y_{jk}$ may be represented as

$$Y_{jk} = M_j + R_{jk} \quad \forall j, k \tag{23}$$

where the $M_j$ and $R_{jk}$ have the following properties:

$$E(Y_{jk}) = E(M_j), \quad E(R_{jk}) = 0 \quad \forall j, k \tag{24}$$
$$(M_i, M_j) = (M_i, Y_{jk}) = c_{ij}, \quad (M_i, R_{jk}) = 0 \quad \forall i, j, k \tag{25}$$
$$(R_{ik}, R_{jl}) = (Y_{ik}, R_{jl}) = (Y_{ik}, Y_{jl}) - c_{ij} \quad \forall i, j, k, l \tag{26}$$

Further, the $\{R_{jk} \mid \forall j, k\}$ are generalised second-order n-step exchangeable over $k$, with $(R_{ik}, R_{jl}) = 0$, $\forall i, j$,
$|k - l| \geq n$.

Proof
Let

$$M_{im} = \frac{1}{m}\sum_{k=1}^{m} Y_{ik} \quad \forall i, m \tag{27}$$

Observe that the sequence $M_{i1}, M_{i2}, \ldots$ is Cauchy $\forall i$, i.e. that $(M_{ik} - M_{il}, M_{ik} - M_{il}) \to 0$ as $k, l \to \infty$,
which follows directly from the properties of n-step exchangeable sequences. Construct the quantity $M_i$ to
be the Cauchy limit of this sequence so that

$$\lim_{m \to \infty} (M_{im}, Y) = (M_i, Y) \quad \forall i, \ \forall Y \in \mathcal{H} \tag{28}$$

Linearity of $E(\cdot)$ gives $E(M_{im}) = e_i$, $\forall i, m$, and hence applying (28) for $Y \in C$ we deduce $E(M_i) = e_i$.
Define $R_{ik}$ via $R_{ik} = Y_{ik} - M_i$, $\forall i, k$, so that $E(R_{ik}) = 0$, $\forall i, k$. The other properties of the representation
follow directly from (28). □

As for the case of second-order exchangeability, the mean components of the representation, $M_j$, represent
the quantities which we may learn about by linear fitting on the data. We may resolve as much uncertainty as
we wish about these quantities given a sufficient number of observations, by such linear fitting. We therefore
say that the n-step exchangeable collection $\{Y_{jk} \mid \forall j, k\}$ with representation $Y_{jk} = M_j + R_{jk}$, $\forall j, k$, identify
the random quantities $M_j$, $\forall j$.

4.3 Identification of the covariance structure underlying the DLM 

The n-step exchangeability representation theorem allows us to construct models for the observable quadratic
products we have formed. The elements of the collection $\{X'_t{X'_t}^T \mid \forall t \geq 2\}$ for the two-step invertible DLM,
are 2-step exchangeable over $t$. From Theorem 1, we may construct the representation (23). The identified
quantities may be constructed as the Cauchy limit of the arithmetic means of the elements.

Lemma 2 The 2-step exchangeable collection $\{X'_t{X'_t}^T \mid \forall t \geq 2\}$ identify the matrix

$$M' = F^TV^\omega F + V^\nu + HV^\nu H^T \tag{29}$$

and the 3-step exchangeable collection $\{X''_t{X''_t}^T \mid \forall t \geq 3\}$ identify

$$M'' = F^TGV^\omega G^TF + F^TV^\omega F + V^\nu + H^2V^\nu H^{2T} \tag{30}$$
Proof

$$M' = \lim_{N \to \infty} \frac{1}{N} \sum_{t=2}^{N} X'_t{X'_t}^T \tag{31}$$
$$= \lim_{N \to \infty} \frac{1}{N} \sum_{t=2}^{N} (F^T\omega_t + \nu_t - H\nu_{t-1})(F^T\omega_t + \nu_t - H\nu_{t-1})^T \tag{32}$$
$$= \lim_{N \to \infty} \frac{1}{N} \sum_{t=2}^{N} F^T\omega_t\omega_t^TF + \nu_t\nu_t^T + H\nu_{t-1}\nu_{t-1}^TH^T \tag{33}$$
$$= F^TV^\omega F + V^\nu + HV^\nu H^T \tag{34}$$

The derivation of $M''$ is similar. □

Now since $\frac{1}{2}[HM'H^T - (M'' - M')] = HV^\nu H^T$ we deduce that for a 2-step invertible series, the collection

$$\left\{ \tfrac{1}{2}\left[HX'_t{X'_t}^TH^T - (X''_t{X''_t}^T - X'_t{X'_t}^T)\right] \ \middle|\ \forall t \geq 3 \right\} \tag{35}$$

identifies $HV^\nu H^T$.

Now if $r > p$, $H$ can usually be chosen to be invertible. If $r < p$, since we assume that $F$ and $G$
are of full rank, so is $H$, and so $H$ is invertible. In this paper we restrict attention to those two-step
invertible DLMs which have an invertible $H$. Consequently, since $\frac{1}{2}[M' - H^{-1}(M'' - M')H^{-1T}] = V^\nu$ and
$M' - V^\nu - HV^\nu H^T = F^TV^\omega F$ we get

Theorem 3 For a 2-step invertible series with invertible $H$, put

$$M_t = \tfrac{1}{2}\left[X'_t{X'_t}^T - H^{-1}(X''_t{X''_t}^T - X'_t{X'_t}^T)H^{-1T}\right] \tag{36}$$

(i) The collection of matrices $\{M_t \mid \forall t \geq 3\}$ identify $V^\nu$.

(ii) The collection of matrices $\{X'_t{X'_t}^T - M_t - HM_tH^T \mid \forall t \geq 3\}$ identify $F^TV^\omega F$.

□



If $p > r$, we do not identify the entire matrix $V^\omega$. This would require a fuller selection of observables
than we have considered here. The identified matrix, $F^TV^\omega F$, is the contribution to the uncertainty for $X_t$
from $\Theta_t$, given $\Theta_{t-1}$.

4.4 Example 

In our example, Theorem 3 implies that the collection $\{X_t^{(2)}X_t^{(2)T} - X_t^{(1)}X_t^{(1)T} \mid \forall t \geq 3\}$ identify $V^\omega$ and
that the collection $\{X_t^{(1)}X_t^{(1)T} - \frac{1}{2}X_t^{(2)}X_t^{(2)T} \mid \forall t \geq 3\}$ identify $V^\nu$.
By observing sales at an increasing (but finite) number of time points, we may resolve through linear
fitting, as much uncertainty as we wish about the underlying covariance structure for the particular time
series model we are dealing with.
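
At the level of prior expectations, the identification result for the example can be verified directly from (9), (10), (19) and (20). The sketch below assumes only the zero-mean and zero-correlation conditions (3) and (4), under which the cross terms in the quadratic products have zero expectation. The helper names `exchangeable` and `combine` are illustrative.

```python
R = 6

def exchangeable(diag, off, r=R):
    """r x r matrix with `diag` on the diagonal and `off` elsewhere."""
    return [[diag if i == j else off for j in range(r)] for i in range(r)]

def combine(a, A, b, B):
    """Elementwise a*A + b*B for two square matrices stored as nested lists."""
    n = len(A)
    return [[a * A[i][j] + b * B[i][j] for j in range(n)] for i in range(n)]

W = exchangeable(4, 1)      # E(V^omega) = W, from (9)
V = exchangeable(36, -4)    # E(V^nu) = V, from (10)

# Prior expectations of the observable quadratic products implied by (19), (20).
E_Q1 = combine(1, W, 2, V)  # E[X1 X1^T] = W + 2V
E_Q2 = combine(2, W, 2, V)  # E[X2 X2^T] = 2W + 2V

# The two combinations that appear in the example.
state_part = combine(1, E_Q2, -1, E_Q1)    # E[X2 X2^T - X1 X1^T]
obs_part = combine(1, E_Q1, -0.5, E_Q2)    # E[X1 X1^T - 0.5 * X2 X2^T]

print(state_part == W, obs_part == V)
```

This only confirms that the two combinations have the right prior means. The stronger statement in the text, that averaging them over time resolves uncertainty about the underlying matrices, rests on the n-step exchangeability representation.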

If all fourth order prior belief specifications have been made, a simple Bayes linear analysis can be carried
out in order to learn about the underlying covariance structure by adjusting the elements of $V^\omega$ and $V^\nu$ by
the elements of the observable matrices $X_t^{(1)}X_t^{(1)T}$, $X_t^{(2)}X_t^{(2)T}$. However, for long, high-dimensional time
series, the number of quantities involved in a full linear adjustment is extremely large, and so it is important
to reduce the dimensionality of the problem, and preserve the inherent matrix structure.



5 Matrix objects for the time series

5.1 Formation of the matrix space

We are interested in learning about $r \times r$ dimensional covariance matrices. We first form a collection of
$r^2$ linearly independent $r \times r$ constant matrices such that $C_{r(i-1)+j}$ is the matrix with a 1 in the $(i,j)^{th}$
position, and zeros elsewhere, where $i$ and $j$ range from 1 to $r$, and call this collection $C = \{C_1, \ldots, C_{r^2}\}$.
This collection of matrices is a basis for the space of known $r \times r$ matrices. Define the collections of matrices

$$X^*_2 = \{X'_2{X'_2}^T, \ HX'_2{X'_2}^TH^T, \ H^{-1}X'_2{X'_2}^TH^{-1T}\} \tag{37}$$

$$X^*_t = \{X'_t{X'_t}^T, \ X''_t{X''_t}^T, \ HX'_t{X'_t}^TH^T, \ H^{-1}X'_t{X'_t}^TH^{-1T}, \ H^{-1}X''_t{X''_t}^TH^{-1T}\}, \quad \forall t \geq 3 \tag{38}$$

Form the real vector space, $\mathcal{N}$, whose elements are linear combinations of random $r \times r$ matrices as follows.

$$\mathcal{N} = \mathrm{span}\{C \cup X^*_2 \cup X^*_3 \cup \ldots\} \tag{39}$$

Define an inner-product on $\mathcal{N}$ via

$$(A, B) = E(\mathrm{Tr}[AB^T]), \quad \forall A, B \in \mathcal{N} \tag{40}$$

This inner-product is discussed and motivated in Wilkinson and Goldstein (1995). Complete $\mathcal{N}$ into a Hilbert
space, $\mathcal{M}$. When the space is completed, limit points such as $HV^\nu H^T$, $V^\nu$, and $F^TV^\omega F$ are added to the
space. The inner-product on this space is determined by our beliefs about the quadratic products, since

$$(A, B) = \sum_{j=1}^{r}\sum_{k=1}^{r} \left[\mathrm{Cov}(A_{jk}, B_{jk}) + E(A_{jk})E(B_{jk})\right] \quad \forall A, B \in \mathcal{M} \tag{41}$$

We may carry out Bayes linear adjustment in this space by orthogonal projection of the matrices of interest
into subspaces of observable matrices. Our adjusted expectations for these matrices are linear combinations
of the prior matrices and the observable matrices. Note that this matrix approach to belief adjustment is
a more direct way of getting at desirable linearity properties of conditional expectations for matrices, than
via somewhat artificial constructs such as the matrix normal distribution.
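
The step from the trace form of the inner-product to the elementwise sum rests on the identity $\mathrm{Tr}[AB^T] = \sum_j \sum_k A_{jk}B_{jk}$. The following sketch checks that identity for two small constant matrices chosen arbitrarily for illustration. The function names are hypothetical and the code works with fixed matrices only, so it illustrates the part of the inner-product that sits inside the expectation.

```python
def trace_inner_product(A, B):
    """Tr[A B^T], computed by forming the matrix product explicitly."""
    r = len(A)
    ABt = [[sum(A[i][k] * B[j][k] for k in range(r)) for j in range(r)]
           for i in range(r)]
    return sum(ABt[i][i] for i in range(r))

def elementwise_inner_product(A, B):
    """Sum over all positions (j, k) of A_jk * B_jk."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

# Arbitrary 2 x 2 matrices used purely to illustrate the identity.
A = [[1, 2], [3, 4]]
B = [[0, -1], [5, 2]]
print(trace_inner_product(A, B), elementwise_inner_product(A, B))

# The same function gives the constant part of the squared norm of a prior mean:
# here the sum over j, k of E(V_jk)^2 for the 6 x 6 prior matrix in (10).
prior_V = [[36 if i == j else -4 for j in range(6)] for i in range(6)]
print(elementwise_inner_product(prior_V, prior_V))
```

Because only same-position products enter the sum, covariances between elements in different positions of the two matrices never appear, which is why the matrix object approach needs far fewer fourth order specifications than a full elementwise adjustment.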
5.2 Example

For our example, we simply construct

$$\mathcal{N} = \mathrm{span}\{C, \ X_2^{(1)}X_2^{(1)T}, \ X_3^{(1)}X_3^{(1)T}, \ \ldots, \ X_3^{(2)}X_3^{(2)T}, \ X_4^{(2)}X_4^{(2)T}, \ \ldots\} \tag{42}$$

and impose the inner-product (40), inducing the Hilbert space $\mathcal{M}$, which contains limit points such as $V^\nu$
and $V^\omega$. Note that in order to evaluate (41), the specifications needed are precisely those which were made
in Section 3.2. The fact that many other aspects of the fourth order specifications are not necessary is
very helpful, as this greatly reduces the specification burden. Often it is most straightforward to make
direct primitive specifications for the matrix object inner-product. However, for simplicity in this paper, we
have built up the specifications for the matrix inner product from specifications over the scalar quadratic
products, thus establishing the links between the scalar and matrix analysis. This is analogous to specifying
an expectation of a random quantity by breaking it up over a partition of events and specifying probabilities
over the partition.

5.3 n-step exchangeable matrix objects 

The definition of generalised n-step exchangeability applies directly to matrix objects in the space $\mathcal{M}$. The
collection of matrix objects $\{X'_t{X'_t}^T \mid \forall t \geq 2\}$ is 2-step exchangeable in the space $\mathcal{M}$, and the collection
$\{X''_t{X''_t}^T \mid \forall t \geq 3\}$ is 3-step exchangeable. This leads to a restatement of Theorem 3 for matrices in the
space $\mathcal{M}$. The limit points are the matrices of limit points of their elements, due to the consistency of the
inner-products on the scalar and matrix spaces.

Theorem 4 Put $M_t = \frac{1}{2}\left[X'_t{X'_t}^T - H^{-1}(X''_t{X''_t}^T - X'_t{X'_t}^T)H^{-1T}\right]$.

(i) The collection $\{M_t \mid \forall t \geq 3\}$ identifies $V^\nu$ in $\mathcal{M}$.

(ii) The collection $\{X'_t{X'_t}^T - M_t - HM_tH^T \mid \forall t \geq 3\}$ identifies $F^TV^\omega F$ in $\mathcal{M}$.

□

Now, for any subspace $D$ such that $C \subseteq D \subseteq \mathcal{M}$, we define adjusted matrix expectation, $E_D: \mathcal{M} \to D$,
to be such that $\forall Y \in \mathcal{M}$, $E_D(Y)$ is the orthogonal projection of $Y$ into $D$, with respect to the inner-product
$(\cdot,\cdot)$. If $D = \mathrm{span}\{C\}$ then we write $E(\cdot)$ for $E_D(\cdot)$, as this is the usual matrix expectation, consisting of the
matrix of expectations of elements.
5.4 Adjustment

Suppose that we are considering observing $n \geq 3$ time points in the series. Form the matrix space, $\mathcal{M}$, and
the observable subspace $D_n \subseteq \mathcal{M}$

$$D_n = \mathrm{span}\{C, X^*_2, X^*_3, \ldots, X^*_n\} \tag{43}$$

Then the adjusted expectation map, $E_{D_n}: \mathcal{M} \to D_n$, is the orthogonal projection into the $D_n$ space.
In particular, we evaluate $E_{D_n}(V^\nu)$, $E_{D_n}(F^TV^\omega F)$, which are matrices in the $D_n$ space. These adjusted
expectations are analogous to posterior expected values for the matrix objects, taking into account only
those limited aspects of the problem which we have considered. The general relationship between adjusted
and posterior beliefs is discussed in Goldstein (1994).
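
To illustrate what orthogonal projection into an observable subspace produces, the sketch below treats the simplest possible case: a single observable matrix $D$ together with the constants. Because the constants span every fixed matrix, the projection of $Y$ reduces to $E(Y) + \beta (D - E(D))$, where the scalar $\beta$ is the ratio of the inner-products of the centred quantities, that is $\sum_{jk}\mathrm{Cov}(Y_{jk}, D_{jk}) / \sum_{jk}\mathrm{Var}(D_{jk})$. The function name `adjust` and all numbers in the usage lines are hypothetical, chosen only to show the mechanics. They are not the specifications or data of the shampoo example, and the full adjustment in the paper projects onto many observables rather than one.

```python
def adjust(prior_Y, prior_D, observed_D, cov_YD, var_D):
    """Adjusted expectation of a matrix Y given one observable matrix D.

    cov_YD is the sum over (j, k) of Cov(Y_jk, D_jk) and var_D is the sum over
    (j, k) of Var(D_jk): the matrix inner-products of the centred quantities.
    The result is prior_Y + beta * (observed_D - prior_D) with a scalar beta.
    """
    beta = cov_YD / var_D
    r = len(prior_Y)
    return [[prior_Y[i][j] + beta * (observed_D[i][j] - prior_D[i][j])
             for j in range(r)] for i in range(r)]

# Hypothetical 2 x 2 illustration (not the specifications of the example).
prior_Y = [[4.0, 1.0], [1.0, 4.0]]      # prior expectation of the matrix of interest
prior_D = [[4.0, 1.0], [1.0, 4.0]]      # prior expectation of the observable
observed_D = [[8.0, 0.0], [0.0, 6.0]]   # observed value of the observable
adjusted = adjust(prior_Y, prior_D, observed_D, cov_YD=2.0, var_D=8.0)
print(adjusted)
```

A single scalar weight applies to the whole matrix, so the adjusted expectation is a linear combination of the prior matrix and the observed matrix, which is the structural property described above.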



6 Bayes linear adjustment for the example 
6.1 The adjusted covariance matrices 

Adjustments were carried out using 17 time points from the actual time series. The matrix objects $V^\omega$ and
$V^\nu$ were adjusted in the following ways:

[Equations (44) and (45): the adjusted $6 \times 6$ matrices for $V^\omega$ and $V^\nu$. The matrix entries are not recoverable here.]



Comparing these with their prior specifications, given in (9) and (10), we see that the adjusted matrices are
perturbations of the prior expectations for the matrices. Notice that the variance associated with the fourth
variable has been inflated considerably in both matrices. The sample variances for the 17 cases of the six
brands we considered were 167, 22, 37, 560, 18 and 427. Informally, it seems that there may indeed be more
variability associated with the fourth (and sixth) variable.



6.2 First order adjustment 

Since our aim is to predict sales more accurately, a sensible test of the procedure is to compare the
performance of the first order model, (5), (6), using both the prior and adjusted covariance matrices. We find that
the Bayes linear diagnostic warnings (the size and bearings of the adjustments, as described in Goldstein
(1988)) are noticeably closer to their expected values when using the adjusted matrices. For the given
example, most of the size ratios for adjustments of the first order structure were noticeably closer to one using
the adjusted covariance structure to predict future values, suggesting that the adjusted matrices match more
closely with the forecast performance of the model.



7 Conclusions 

Good forecasting requires careful updating of the covariances within the time series structure. Informally, 
the degree of shrinkage between the prior and the data is updated, and relationships between variables are 
properly taken into account. Since we are able to adjust the covariance matrices for both the observational 
and the state residuals, we are able to properly understand the competition and demand effects taking 
place within the series. By taking a matrix object approach, we greatly simplify the problem by reducing 
dimensionality. This is important for simplifying both belief specification and belief adjustment, and also for 
interpreting the structure of the adjustment and the accompanying diagnostics. There are also the general 
advantages of the Bayes linear approach, namely complete flexibility in the prior specifications, 
without distributional restrictions on the data or model components. 

Appendix 

Covariance structure for the example 

We give here the full covariance structure over the quadratic products of the one and two step differences, 
defined by (|il|), j2tj|). First we need some notation for two different "direct products" of matrices. 



Definition 5 For $r \times r$ matrices $A$ (having entry $a_{ij}$ in row $i$, column $j$) and $B$ (having entry $b_{ij}$ in row $i$, 
column $j$) we define the (left) tensor product, $A \otimes B$, of $A$ and $B$ to be the $r^2 \times r^2$ matrix with the element 
$a_{jk} b_{lm}$ in row $r(l-1)+j$, column $r(m-1)+k$. 

Definition 6 For $r \times r$ matrices $A$ (having entry $a_{ij}$ in row $i$, column $j$) and $B$ (having entry $b_{ij}$ in row $i$, 
column $j$) we define the star product, $A * B$, of $A$ and $B$ to be the $r^2 \times r^2$ matrix with the element 
$a_{jk} b_{lm}$ in row $r(l-1)+j$, column $r(k-1)+m$. 
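
The two products can be made concrete with a short sketch in plain Python. It uses 0-based indices, so row $r(l-1)+j$ becomes `r*l + j`. The function names `outer`, `vec`, `left_tensor` and `star` are illustrative choices, and `vec` is assumed here to stack columns. The final lines check, for two fixed vectors `u` and `w`, that the outer products of the vectorised matrices reproduce the two definitions elementwise.

```python
def outer(u, v):
    """Outer product u v^T as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def vec(M):
    """Stack the columns of M into one list (assumed convention for vec)."""
    return [M[i][c] for c in range(len(M[0])) for i in range(len(M))]


def left_tensor(A, B):
    """Definition 5: a_jk * b_lm in row r(l-1)+j, column r(m-1)+k (1-based)."""
    r = len(A)
    M = [[0] * (r * r) for _ in range(r * r)]
    for j in range(r):
        for k in range(r):
            for l in range(r):
                for m in range(r):
                    # 0-based version of the row and column formulas
                    M[r * l + j][r * m + k] = A[j][k] * B[l][m]
    return M


def star(A, B):
    """Definition 6: a_jk * b_lm in row r(l-1)+j, column r(k-1)+m (1-based)."""
    r = len(A)
    M = [[0] * (r * r) for _ in range(r * r)]
    for j in range(r):
        for k in range(r):
            for l in range(r):
                for m in range(r):
                    # Same entries as the tensor product; only the column changes
                    M[r * l + j][r * k + m] = A[j][k] * B[l][m]
    return M


# Sanity check with fixed integer vectors (hypothetical values).
u = [1, 2, 3]
w = [4, 5, 6]
A, B = outer(u, u), outer(w, w)
v_uw = vec(outer(u, w))  # vec(u w^T)
v_wu = vec(outer(w, u))  # vec(w u^T)
tensor_ok = outer(v_uw, v_uw) == left_tensor(A, B)
star_ok = outer(v_uw, v_wu) == star(A, B)
print(tensor_ok, star_ok)
```

With the column-stacking convention, the left tensor product of Definition 5 coincides with the usual Kronecker product taken in the reverse order, that is, the Kronecker product of $B$ with $A$.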

Given these definitions, the covariance structure over the quadratic products of the 1-step differences is 
determined by the following relations: 

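$$\operatorname{Cov}\bigl(\operatorname{vec} V^{\omega},\, \operatorname{vec}(X_t^{(1)} X_t^{(1)T})\bigr) = \operatorname{Var}(\operatorname{vec} V^{\omega}) \tag{46}$$

$$\operatorname{Cov}\bigl(\operatorname{vec} V^{\nu},\, \operatorname{vec}(X_t^{(1)} X_t^{(1)T})\bigr) = 2\operatorname{Var}(\operatorname{vec} V^{\nu}) \tag{47}$$

$$\begin{aligned}
\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_t^{(1)} X_t^{(1)T})\bigr) ={}& \operatorname{Var}(\operatorname{vec} V^{\omega}) + 4\operatorname{Var}(\operatorname{vec} V^{\nu}) \\
&+ \operatorname{Var}(\operatorname{vec} S_t^{\nu}) + \operatorname{Var}(\operatorname{vec} S_{t-1}^{\nu}) + \operatorname{Var}(\operatorname{vec} S_t^{\omega}) \\
&+ 2\bigl[\operatorname{E}(V^{\nu} \otimes V^{\nu}) + \operatorname{E}(V^{\nu} * V^{\nu})\bigr] \\
&+ 4\bigl[\operatorname{E}(V^{\nu}) \otimes \operatorname{E}(V^{\omega}) + \operatorname{E}(V^{\nu}) * \operatorname{E}(V^{\omega}) \\
&\qquad + \operatorname{E}(V^{\omega}) \otimes \operatorname{E}(V^{\nu}) + \operatorname{E}(V^{\omega}) * \operatorname{E}(V^{\nu})\bigr]
\end{aligned} \tag{48}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t-1}^{(1)} X_{t-1}^{(1)T})\bigr) = 4\bigl(\operatorname{Var}(\operatorname{vec} V^{\nu}) + \operatorname{Var}(\operatorname{vec} V^{\omega})\bigr) + \operatorname{Var}(\operatorname{vec} S_{t-1}^{\nu}) \tag{49}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t-s}^{(1)} X_{t-s}^{(1)T})\bigr) = 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + \operatorname{Var}(\operatorname{vec} V^{\omega}) \qquad \forall t,\ \forall s \ge 2 \tag{50}$$
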
The covariance structure over the quadratic products of the 2-step differences is given below. 

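$$\operatorname{Cov}\bigl(\operatorname{vec} V^{\omega},\, \operatorname{vec}(X_t^{(2)} X_t^{(2)T})\bigr) = 2\operatorname{Var}(\operatorname{vec} V^{\omega}) \tag{51}$$

$$\operatorname{Cov}\bigl(\operatorname{vec} V^{\nu},\, \operatorname{vec}(X_t^{(2)} X_t^{(2)T})\bigr) = 2\operatorname{Var}(\operatorname{vec} V^{\nu}) \tag{52}$$

$$\begin{aligned}
\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(2)} X_t^{(2)T}),\, \operatorname{vec}(X_t^{(2)} X_t^{(2)T})\bigr) ={}& 4\operatorname{Var}(\operatorname{vec} V^{\omega}) + 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + \operatorname{Var}(\operatorname{vec} S_t^{\nu}) \\
&+ \operatorname{Var}(\operatorname{vec} S_{t-2}^{\nu}) + \operatorname{Var}(\operatorname{vec} S_t^{\omega}) + \operatorname{Var}(\operatorname{vec} S_{t-1}^{\omega}) \\
&+ 2\bigl[\operatorname{E}(V^{\nu} \otimes V^{\nu}) + \operatorname{E}(V^{\nu} * V^{\nu}) + \operatorname{E}(V^{\omega} \otimes V^{\omega}) \\
&\qquad + \operatorname{E}(V^{\omega} * V^{\omega})\bigr] + 4\bigl[\operatorname{E}(V^{\nu}) \otimes \operatorname{E}(V^{\omega}) + \operatorname{E}(V^{\nu}) * \operatorname{E}(V^{\omega}) \\
&\qquad + \operatorname{E}(V^{\omega}) \otimes \operatorname{E}(V^{\nu}) + \operatorname{E}(V^{\omega}) * \operatorname{E}(V^{\nu})\bigr]
\end{aligned} \tag{53}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(2)} X_t^{(2)T}),\, \operatorname{vec}(X_{t-1}^{(2)} X_{t-1}^{(2)T})\bigr) = 4\bigl[\operatorname{Var}(\operatorname{vec} V^{\nu}) + \operatorname{Var}(\operatorname{vec} V^{\omega})\bigr] + \operatorname{Var}(\operatorname{vec} S_{t-1}^{\omega}) \tag{54}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(2)} X_t^{(2)T}),\, \operatorname{vec}(X_{t-2}^{(2)} X_{t-2}^{(2)T})\bigr) = 4\bigl[\operatorname{Var}(\operatorname{vec} V^{\nu}) + \operatorname{Var}(\operatorname{vec} V^{\omega})\bigr] + \operatorname{Var}(\operatorname{vec} S_{t-2}^{\nu}) \tag{55}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(2)} X_t^{(2)T}),\, \operatorname{vec}(X_{t-s}^{(2)} X_{t-s}^{(2)T})\bigr) = 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + \operatorname{Var}(\operatorname{vec} V^{\omega}) \qquad \forall t,\ \forall s \ge 3 \tag{56}$$
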
The covariances between the one and two step differences are determined as follows: 

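$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t+s}^{(2)} X_{t+s}^{(2)T})\bigr) = 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + 2\operatorname{Var}(\operatorname{vec} V^{\omega}) \qquad \forall t,\ \forall s \ge 3 \tag{57}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t+2}^{(2)} X_{t+2}^{(2)T})\bigr) = 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + 2\operatorname{Var}(\operatorname{vec} V^{\omega}) + \operatorname{Var}(\operatorname{vec} S_t^{\nu}) \tag{58}$$

$$\begin{aligned}
\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t+1}^{(2)} X_{t+1}^{(2)T})\bigr) ={}& 2\operatorname{Var}(\operatorname{vec} V^{\omega}) + 4\operatorname{Var}(\operatorname{vec} V^{\nu}) \\
&+ \operatorname{Var}(\operatorname{vec} S_{t-1}^{\nu}) + \operatorname{Var}(\operatorname{vec} S_t^{\omega}) \\
&+ \operatorname{E}(V^{\nu}) \otimes \operatorname{E}(V^{\omega}) + \operatorname{E}(V^{\nu}) * \operatorname{E}(V^{\omega}) \\
&+ \operatorname{E}(V^{\omega}) \otimes \operatorname{E}(V^{\nu}) + \operatorname{E}(V^{\omega}) * \operatorname{E}(V^{\nu})
\end{aligned} \tag{59}$$
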
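$$\begin{aligned}
\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_t^{(2)} X_t^{(2)T})\bigr) ={}& 2\operatorname{Var}(\operatorname{vec} V^{\omega}) + 4\operatorname{Var}(\operatorname{vec} V^{\nu}) \\
&+ \operatorname{E}(V^{\omega}) \otimes \operatorname{E}(V^{\nu}) + \operatorname{E}(V^{\omega}) * \operatorname{E}(V^{\nu})
\end{aligned} \tag{60}$$
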
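$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t-1}^{(2)} X_{t-1}^{(2)T})\bigr) = 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + 2\operatorname{Var}(\operatorname{vec} V^{\omega}) + \operatorname{Var}(\operatorname{vec} S_{t-1}^{\nu}) \tag{61}$$

$$\operatorname{Cov}\bigl(\operatorname{vec}(X_t^{(1)} X_t^{(1)T}),\, \operatorname{vec}(X_{t-s}^{(2)} X_{t-s}^{(2)T})\bigr) = 4\operatorname{Var}(\operatorname{vec} V^{\nu}) + 2\operatorname{Var}(\operatorname{vec} V^{\omega}) \qquad \forall t,\ \forall s \ge 2 \tag{62}$$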

These results are obtained by focussing on a general element of a matrix on the left hand side, and then 
substituting into the left hand sides the definitions (19) and (20), expanding the covariances, substituting 
representations (|l^) and (|l^), and then simplifying the result using known orthogonalities to deduce the 
general element of the matrices on the right hand side. However, there are several hundred terms in some of 
the expansions and a computer algebra package was used to ensure the accuracy of the results. 
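
The general-element argument can be illustrated on a single cross term. The following is only a sketch under illustrative assumptions that are not taken from the model above: $u$ and $w$ are independent, zero-mean random $r$-vectors with $\operatorname{E}(uu^T) = A$ and $\operatorname{E}(ww^T) = B$, and vec stacks columns, so that element $r(l-1)+j$ of $\operatorname{vec}(uw^T)$ is $u_j w_l$. Since $\operatorname{E}(u_j w_l) = \operatorname{E}(u_j)\operatorname{E}(w_l) = 0$, we have

$$\begin{aligned}
\operatorname{Cov}\bigl([\operatorname{vec}(uw^T)]_{r(l-1)+j},\, [\operatorname{vec}(uw^T)]_{r(m-1)+k}\bigr)
  &= \operatorname{E}(u_j w_l u_k w_m) - \operatorname{E}(u_j w_l)\operatorname{E}(u_k w_m) \\
  &= \operatorname{E}(u_j u_k)\operatorname{E}(w_l w_m) = a_{jk} b_{lm}, \\
\operatorname{Cov}\bigl([\operatorname{vec}(uw^T)]_{r(l-1)+j},\, [\operatorname{vec}(wu^T)]_{r(k-1)+m}\bigr)
  &= \operatorname{E}(u_j w_l w_m u_k) - \operatorname{E}(u_j w_l)\operatorname{E}(w_m u_k) \\
  &= \operatorname{E}(u_j u_k)\operatorname{E}(w_l w_m) = a_{jk} b_{lm}.
\end{aligned}$$

By Definition 5, the first line gives the element of $A \otimes B$ in row $r(l-1)+j$, column $r(m-1)+k$, and by Definition 6 the second gives the element of $A * B$ in row $r(l-1)+j$, column $r(k-1)+m$. Under these assumptions, $\operatorname{Var}(\operatorname{vec}(uw^T)) = A \otimes B$ and $\operatorname{Cov}(\operatorname{vec}(uw^T), \operatorname{vec}(wu^T)) = A * B$, which is how terms of the form $\operatorname{E}(\cdot) \otimes \operatorname{E}(\cdot)$ and $\operatorname{E}(\cdot) * \operatorname{E}(\cdot)$ can arise in expansions of this kind.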